Aspheron Ridge
Machine Understanding of Scientific Language
Scientific information expresses human understanding of nature. This knowledge is largely disseminated in different forms of text, including scientific papers, news articles, and discourse among people on social media. While important for accelerating our pursuit of knowledge, not all scientific text is faithful to the underlying science. As the volume of this text has burgeoned online in recent years, it has become a problem of societal importance to be able to identify the faithfulness of a given piece of scientific text automatically. This thesis is concerned with the cultivation of datasets, methods, and tools for machine understanding of scientific language, in order to analyze and understand science communication at scale. To arrive at this, I present several contributions in three areas of natural language processing and machine learning: automatic fact checking, learning with limited data, and scientific text processing. These contributions include new methods and resources for identifying check-worthy claims, adversarial claim generation, multi-source domain adaptation, learning from crowd-sourced labels, cite-worthiness detection, zero-shot scientific fact checking, detecting exaggerated scientific claims, and modeling degrees of information change in science communication. Critically, I demonstrate how the research outputs of this thesis are useful for effectively learning from limited amounts of scientific text in order to identify misinformative scientific statements and generate new insights into the science communication process.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > United States > Maryland > Baltimore (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
- (40 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- (2 more...)
- Media > News (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Consumer Health (1.00)
- (9 more...)
3D Data Long-Term Preservation in Cultural Heritage
Amico, Nicola, Felicetti, Achille
In digital heritage, effective management and preservation of digital data are crucial. Issues such as file corruption, media obsolescence, and inadequate metadata must be addressed, alongside data migration when software becomes outdated and thorough data curation to aid current and future researchers in searching, citing, and reusing historical data. Merely archiving or backing up project data is not enough for long-term preservation. It is essential to ensure that primary data remain reusable, compatible with evolving operating systems, and accompanied by comprehensive metadata detailing their creation and history [1]. Despite the advantage of heritage datasets being "born digital," they are still susceptible to loss if file associations and metadata are not properly maintained. The large volume of data generated from digital projects and the often limited understanding of file associations among project members jeopardise the future reuse of archaeological data if not well-organised or curated. Enhancing workflows to include both metadata authorship and preservation is vital to prevent information loss and digital data obsolescence. Particularly, the long-term preservation of 3D datasets requires maintaining each file in a usable and uncorrupted state. Files undergo several modifications, changing formats during the creation of the final scan or 3D model, known as an asset.
- North America > Canada > British Columbia > East Kootenay Region > Fernie (0.04)
- Europe > Middle East > Cyprus (0.04)
- North America > United States > Michigan (0.04)
- (10 more...)
- Research Report (1.00)
- Workflow (0.88)
- Education (1.00)
- Information Technology > Security & Privacy (0.92)
- Law (0.68)
Does Medical Imaging learn different Convolution Filters?
Recent work has investigated the distributions of learned convolution filters through a large-scale study containing hundreds of heterogeneous image models. Surprisingly, on average, the distributions show only minor drifts across the studied dimensions, including the learned task, image domain, and dataset. However, among the studied image domains, medical imaging models appeared to show significant outliers through "spikey" distributions and, therefore, to learn clusters of highly specific filters different from other domains. Following this observation, we study the collected medical imaging models in more detail. We show that instead of fundamental differences, the outliers are due to specific processing in some architectures. On the contrary, for standardized architectures, we find that models trained on medical data do not significantly differ in their filter distributions from similar architectures trained on data from other domains. Our conclusions reinforce previous hypotheses stating that pre-training of imaging models can be done with any kind of diverse image data.
- North America > United States > Wyoming > Campbell County (0.04)
- Europe > Switzerland (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Asia > Turkmenistan > Aspheron Ridge (0.04)
Large-scale Evaluation of Transformer-based Article Encoders on the Task of Citation Recommendation
Recently introduced transformer-based article encoders (TAEs) designed to produce similar vector representations for mutually related scientific articles have demonstrated strong performance on benchmark datasets for scientific article recommendation. However, the existing benchmark datasets are predominantly focused on single domains and, in some cases, contain easy negatives in small candidate pools. Evaluating representations on such benchmarks might obscure the realistic performance of TAEs in setups with thousands of articles in candidate pools. In this work, we evaluate TAEs on large benchmarks with more challenging candidate pools. We compare the performance of TAEs with that of a lexical retrieval baseline, BM25, on the task of citation recommendation, where the model produces a list of articles recommended for citation in a given input article. We find that BM25 is still very competitive with state-of-the-art neural retrievers, a finding that is surprising given the strong performance of TAEs on small benchmarks. As a remedy for the limitations of the existing benchmarks, we propose a new benchmark dataset for evaluating scientific article representations: the Multi-Domain Citation Recommendation dataset (MDCR), which covers different scientific fields and contains challenging candidate pools.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Croatia > Zagreb County > Zagreb (0.04)
- North America > United States > Wyoming > Campbell County (0.04)
- (3 more...)
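The lexical baseline named in the abstract above, BM25, can be illustrated with a minimal sketch. This is not the evaluation code of the paper: the function name `bm25_rank`, the whitespace tokenisation, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration. It scores each candidate article text against the input article text and returns candidate indices, best match first.

```python
import math
from collections import Counter


def bm25_rank(query, candidates, k1=1.5, b=0.75):
    """Rank candidate texts against a query text with Okapi BM25.

    Hypothetical helper for illustration; tokenisation is a plain
    lowercase whitespace split, which a real system would replace.
    """
    docs = [text.lower().split() for text in candidates]
    if not docs:
        return []
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: in how many candidates each term occurs.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(query.lower().split())

    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scored.append((score, idx))
    # Highest score first; ties are broken by original candidate order.
    return [idx for _, idx in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    pool = [
        "transformer encoders for scientific article recommendation",
        "elastic warping for clothing images",
        "preservation of 3d heritage data",
    ]
    print(bm25_rank("citation recommendation with transformer article encoders", pool))
```

In a citation recommendation setup, `candidates` would hold the titles or abstracts in the candidate pool and `query` the text of the input article; a neural encoder would instead rank the same pool by vector similarity.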
Towards Personalized and Human-in-the-Loop Document Summarization
The ubiquitous availability of computing devices and the widespread use of the internet continuously generate a large amount of data. As a result, the amount of available information on any given topic far exceeds humans' capacity to process it, causing what is known as information overload. To cope efficiently with large amounts of information and generate content with significant value to users, we need to identify, merge and summarise information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges by (i) enabling automatic intelligent feature engineering, (ii) enabling flexible and interactive summarisation, and (iii) utilising intelligent and personalised summarisation approaches. The experimental results demonstrate the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data.
- Oceania > Australia > New South Wales > Sydney (0.14)
- Europe > Czechia > Prague (0.04)
- North America > United States > Wyoming > Campbell County (0.04)
- (24 more...)
- Research Report > Promising Solution (1.00)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications > Web (1.00)
- Information Technology > Communications > Social Media (1.00)
- (16 more...)
Fashion Landmark Detection and Category Classification for Robotics
Ziegler, Thomas, Butepage, Judith, Welle, Michael C., Varava, Anastasiia, Novkovic, Tonci, Kragic, Danica
Research on automated, image-based identification of clothing categories and fashion landmarks has recently gained significant interest due to its potential impact on areas such as robotic clothing manipulation, automated clothes sorting and recycling, and online shopping. Several public and annotated fashion datasets have been created to facilitate research advances in this direction. In this work, we make the first step towards leveraging the data and techniques developed for fashion image analysis in vision-based robotic clothing manipulation tasks. We focus on techniques that can generalize from large-scale fashion datasets to less structured, small datasets collected in a robotic lab. Specifically, we propose training data augmentation methods such as elastic warping, and model adjustments such as rotation-invariant convolutions to make the model generalize better. Our experiments demonstrate that our approach outperforms state-of-the-art models with respect to clothing category classification and fashion landmark detection when tested on previously unseen datasets. Furthermore, we present experimental results on a new dataset composed of images where a robot holds different garments, collected in our lab.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Aragón (0.04)
- North America > United States > Wyoming > Campbell County (0.04)
- (3 more...)
- Information Technology > Services > e-Commerce Services (0.34)
- Banking & Finance (0.34)